Abstract


Outliers in assessments are often treated as a nuisance for data analysis; however, they can also 
assist in quality assurance. Their frequency can suggest problems with form codes, scanning 
accuracy, ability of examinees to enter responses as they intend, or exposure of items. 
Outliers are often encountered in educational assessments. They can be used in a program
of quality assurance to detect unusual results that suggest gross errors in test administration 
such as mistakes in form codes or scanning problems. Outliers may also have potential to detect 
problems that involve item disclosure or errors of examinees in data entry. Two types of outliers 
are readily considered. The first type is an unusual score on an examination. The second type is 
an unusual deviation from the score predicted by a regression of an examination subscore on other 
examination scores. Analysis of outliers in assessments is typically made more complicated by the 
large number of examinees. Some outliers are expected with virtually any reasonable definition 
of outliers; however, the fraction of observations that are outliers should be small. To investigate
the potential of outlier analysis, data were examined from an ETS assessment. The methods of
analysis are primarily designed for use with conventional tests that are not adaptive and that are 
scored by adding up raw item scores to provide a total score. The analysis is not concerned with 
subsequent procedures to equate, link, and scale scores. In section 1, the basic methodology is
discussed. In section 2, an application is made to the data under study, and section 3 presents conclusions.

1 Methodology 

The basic methodology involved is quite simple. One has $n$ examinees numbered from 1 to $n$
and test sections numbered from 1 to $q$, where $2 \le q < n - 2$. The test sections do not overlap.
These sections may be conventional sections of a test or sections based on the format of the answer 
sheet. For instance, in the example under study, there are four sections based on content areas 
covered by the test, and there are three columns on the answer sheet. The section scores may be 
quite relevant in a study of whether some examinees have remarkably high or low scores in some 
content area. The column scores may be important if scanning errors or gridding errors are of 
concern. 

For each test section $j$, $1 \le j \le q$, and examinee $i$, a section score $X_{ij}$ is available, and
the total score $Y_i$ for examinee $i$ is the sum of the $q$ section scores for that examinee. Let the
$q$-dimensional vector $\mathbf{X}_i$ have coordinates $X_{ij}$, $1 \le j \le q$. It is assumed that the $\mathbf{X}_i$, $1 \le i \le n$,
are mutually independent and identically distributed, and it is assumed that the $X_{ij}$ have finite
means and variances. The expectation of $X_{ij}$ is $\mu_j$, and the covariance of $X_{ij}$ and $X_{ik}$ is $\gamma_{jk}$. Let
$\boldsymbol{\Gamma}$ be the $q$ by $q$ symmetric covariance matrix of $\mathbf{X}_i$, so that row $j$ and column $k$ of $\boldsymbol{\Gamma}$ is equal
to $\gamma_{jk}$. It is assumed that section scores are not trivially related, so that $\boldsymbol{\Gamma}$ is positive definite
and the correlation coefficient $\rho_{jk} = \gamma_{jk}/(\gamma_{jj}\gamma_{kk})^{1/2}$ of $X_{ij}$ and $X_{ik}$ is defined. Analysis uses the
basic summary statistics that include the section average $\bar{X}_j = n^{-1}\sum_{i=1}^{n} X_{ij}$, which estimates the
population mean $\mu_j$, and the sample covariance

$$C_{jk} = (n-1)^{-1}\sum_{i=1}^{n}(X_{ij} - \bar{X}_j)(X_{ik} - \bar{X}_k),$$

which estimates $\gamma_{jk}$.
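Here is a minimal sketch of these summary statistics. It assumes NumPy and an array with one row per examinee and one column per section; the function name `section_summary` and the scores are hypothetical. Because $Y_i$ is the sum of the section scores, the sum of all entries of $C_{jk}$ reproduces the sample variance of the totals, which makes a convenient check.

```python
import numpy as np

def section_summary(X):
    """Section averages and sample covariances for an (n, q) array of section scores.

    Rows index examinees i and columns index sections j (an assumed layout).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    xbar = X.mean(axis=0)            # \bar X_j = n^{-1} sum_i X_ij
    D = X - xbar                     # deviations X_ij - \bar X_j
    C = D.T @ D / (n - 1)            # C_jk with divisor n - 1
    return xbar, C

# Hypothetical section scores for five examinees on q = 3 sections.
X = np.array([[10, 12, 9], [14, 15, 11], [8, 9, 7], [12, 13, 12], [11, 10, 8]])
xbar, C = section_summary(X)
Y = X.sum(axis=1)                    # total scores Y_i
# The sample variance of the Y_i equals the sum of all entries of C.
```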


1.1 Section Residuals 


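When the means $\mu_j$ and covariances $\gamma_{jk}$ are known, then, for each section score $X_{ij}$, a best
linear predictor is easily found for prediction of $X_{ij}$ by the other section scores $X_{ik}$, $k \ne j$. One
has the predictor

$$\tilde{X}_{ij} = \alpha_j + \sum_{k \ne j} \beta_{jk} X_{ik},$$

where

$$\alpha_j = \mu_j - \sum_{k \ne j} \beta_{jk}\mu_k$$

and

$$\sum_{m \ne j} \gamma_{km}\beta_{jm} = \gamma_{jk}, \quad k \ne j.$$

The error $e_{ij} = X_{ij} - \tilde{X}_{ij}$ then has a mean 0 and a variance

$$\tau_j^2 = \gamma_{jj} - \sum_{k \ne j} \beta_{jk}\gamma_{jk} > 0.$$

In the case of $q = 2$,

$$\begin{aligned}
\beta_{12} &= \gamma_{12}/\gamma_{22},\\
e_{i1} &= (X_{i1} - \mu_1) - \beta_{12}(X_{i2} - \mu_2),\\
\beta_{21} &= \gamma_{21}/\gamma_{11} = \gamma_{12}/\gamma_{11},\\
e_{i2} &= (X_{i2} - \mu_2) - \beta_{21}(X_{i1} - \mu_1),\\
\tau_1^2 &= \gamma_{11} - \beta_{12}\gamma_{12} = \gamma_{11}(1 - \rho_{12}^2),\\
\tau_2^2 &= \gamma_{22} - \beta_{21}\gamma_{21} = \gamma_{22}(1 - \rho_{12}^2),
\end{aligned}$$

and the correlation coefficient of $e_{i1}$ and $e_{i2}$ is easily seen to be

$$\frac{\gamma_{12} - \beta_{12}\gamma_{22} - \beta_{21}\gamma_{11} + \beta_{12}\beta_{21}\gamma_{12}}{(\tau_1^2\tau_2^2)^{1/2}} = -\rho_{12},$$

because $\beta_{12}\gamma_{22} = \beta_{21}\gamma_{11} = \gamma_{12}$, so that the numerator reduces to $-\gamma_{12}(1 - \rho_{12}^2)$.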
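The normal equations for $\beta_{jk}$ and the closed forms for $q = 2$ can be checked numerically. The sketch below assumes NumPy; `best_linear_predictor` and the parameter values are illustrative only. Each error is written as a fixed linear combination of the centred scores, so its variances and covariances follow directly from $\boldsymbol{\Gamma}$.

```python
import numpy as np

def best_linear_predictor(mu, gamma, j):
    """alpha_j, beta_jk (k != j) and tau_j^2 for predicting section j from the others."""
    q = len(mu)
    others = [k for k in range(q) if k != j]
    g_oo = gamma[np.ix_(others, others)]      # gamma_km for k, m != j
    g_oj = gamma[others, j]                   # gamma_jk for k != j
    beta = np.linalg.solve(g_oo, g_oj)        # sum_m gamma_km beta_jm = gamma_jk
    alpha = mu[j] - beta @ mu[others]         # alpha_j = mu_j - sum_k beta_jk mu_k
    tau2 = gamma[j, j] - beta @ g_oj          # tau_j^2 = gamma_jj - sum_k beta_jk gamma_jk
    return alpha, beta, tau2

# Hypothetical q = 2 parameters.
mu = np.array([20.0, 25.0])
gamma = np.array([[16.0, 9.0], [9.0, 25.0]])
rho = gamma[0, 1] / np.sqrt(gamma[0, 0] * gamma[1, 1])

a1, b1, t1 = best_linear_predictor(mu, gamma, 0)
a2, b2, t2 = best_linear_predictor(mu, gamma, 1)
# e_i1 and e_i2 as linear combinations w'(X_i - mu) of the centred scores
w1 = np.array([1.0, -b1[0]])
w2 = np.array([-b2[0], 1.0])
corr_e = (w1 @ gamma @ w2) / np.sqrt(t1 * t2)
```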

In like manner, for $q > 2$, let $j$ be a positive integer not greater than $q$ and let $M$ be a
nonempty subset of positive integers not greater than $q$ such that $j$ is not a member of $M$. Let
$e_{ij \mid M}$ be the error from a linear regression of the section score $X_{ij}$ on the section scores $X_{im}$, $m$ in
$M$. Under the assumption that $\boldsymbol{\Gamma}$ is positive definite, $e_{ij \mid M}$ has a positive variance $\gamma_{jj \mid M}$. If $k$ is a
positive integer not greater than $q$, if $j \ne k$, and if $k$ is not in $M$, then one may also consider the
error $e_{ik \mid M}$ from a linear regression of the section score $X_{ik}$ on the section scores $X_{im}$, $m$ in $M$.
The variance $\gamma_{kk \mid M}$ of $e_{ik \mid M}$ is positive. The partial covariance of $X_{ij}$ and $X_{ik}$ given $X_{im}$, $m$ in $M$, is
$\gamma_{jk \mid M}$, the covariance of $e_{ij \mid M}$ and $e_{ik \mid M}$, and the partial correlation of $X_{ij}$ and $X_{ik}$ given $X_{im}$, $m$ in
$M$, is the correlation

$$\rho_{jk \mid M} = \frac{\gamma_{jk \mid M}}{(\gamma_{jj \mid M}\gamma_{kk \mid M})^{1/2}}$$

of $e_{ij \mid M}$ and $e_{ik \mid M}$. Of special interest is the special case of $M$ equal to the set of positive integers
$m \le q$ such that $m$ is neither $j$ nor $k$. In this case,

$$\beta_{jk} = \gamma_{jk \mid M}/\gamma_{kk \mid M}$$

is the partial regression of $X_{ij}$ on $X_{ik}$ given $X_{im}$, $m$ in $M$,

$$e_{ij} = e_{ij \mid M} - \beta_{jk} e_{ik \mid M},$$

$$\beta_{kj} = \gamma_{jk \mid M}/\gamma_{jj \mid M}$$

is the partial regression of $X_{ik}$ on $X_{ij}$ given $X_{im}$, $m$ in $M$,

$$e_{ik} = e_{ik \mid M} - \beta_{kj} e_{ij \mid M},$$

$$\begin{aligned}
\tau_j^2 &= \gamma_{jj \mid M} - \beta_{jk}\gamma_{jk \mid M} = \gamma_{jj \mid M}(1 - \rho_{jk \mid M}^2),\\
\tau_k^2 &= \gamma_{kk \mid M} - \beta_{kj}\gamma_{jk \mid M} = \gamma_{kk \mid M}(1 - \rho_{jk \mid M}^2),
\end{aligned}$$

and the correlation coefficient of $e_{ij}$ and $e_{ik}$ is $-\rho_{jk \mid M}$ (Lord & Novick, 1968, pp. 264-269). For
instance, if $q = 4$, $j = 1$, and $k = 3$, then $M = \{2, 4\}$ and $\rho_{jk \mid M}$ is the partial correlation of $X_{i1}$
and $X_{i3}$ given $X_{i2}$ and $X_{i4}$.
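The following is a numerical check of the statement that the correlation of $e_{ij}$ and $e_{ik}$ is $-\rho_{jk \mid M}$. It is sketched with NumPy for an arbitrary positive definite $\boldsymbol{\Gamma}$; the helper names and the matrix are hypothetical. The partial correlation is computed from the conditional covariance of $(X_{ij}, X_{ik})$ given $X_{im}$, $m \in M$, which is the covariance of $e_{ij \mid M}$ and $e_{ik \mid M}$.

```python
import numpy as np

def error_weights(gamma, j):
    """Weights w such that e_ij = w'(X_i - mu) for the best linear predictor of section j."""
    q = gamma.shape[0]
    others = [k for k in range(q) if k != j]
    beta = np.linalg.solve(gamma[np.ix_(others, others)], gamma[others, j])
    w = np.zeros(q)
    w[j] = 1.0
    w[others] = -beta
    return w

def error_corr(gamma, j, k):
    """Correlation of the prediction errors e_ij and e_ik."""
    wj, wk = error_weights(gamma, j), error_weights(gamma, k)
    return (wj @ gamma @ wk) / np.sqrt((wj @ gamma @ wj) * (wk @ gamma @ wk))

def partial_corr(gamma, j, k):
    """rho_{jk|M}, with M the sections other than j and k."""
    q = gamma.shape[0]
    m = [i for i in range(q) if i not in (j, k)]
    jk = [j, k]
    cond = gamma[np.ix_(jk, jk)]
    if m:  # conditional covariance: Gamma_jk,jk - Gamma_jk,M Gamma_M,M^{-1} Gamma_M,jk
        cond = cond - gamma[np.ix_(jk, m)] @ np.linalg.solve(
            gamma[np.ix_(m, m)], gamma[np.ix_(m, jk)])
    return cond[0, 1] / np.sqrt(cond[0, 0] * cond[1, 1])

# Hypothetical q = 4 covariance matrix; A is diagonally dominant, so A A' is positive definite.
A = np.array([[4.0, 1, 0, 0], [1, 3, 1, 0], [0, 1, 3, 1], [0.5, 0, 1, 2]])
gamma = A @ A.T
```

The same functions cover $q = 2$, where $M$ is empty and the partial correlation reduces to $\rho_{12}$.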
The standardized error $d_{ij} = e_{ij}/\tau_j$ has a mean of 0 and a variance of 1. If the vectors $\mathbf{X}_i$
have multivariate normal distributions, then the errors $e_{ij}$, $1 \le j \le q$, have a joint multivariate
normal distribution. For each $j$, the mean of $e_{ij}$ is 0 and the variance is $\tau_j^2$, and $e_{ij}$ is independent
of $X_{ik}$, $k \ne j$. It follows that the $d_{ij}$, $1 \le j \le q$, have a joint multivariate normal distribution
with zero mean. For each $j$, $d_{ij}$ has variance 1, so that $d_{ij}$ has a standard normal distribution.
The covariance and correlation $\rho_{jkd}$ of $d_{ij}$ and $d_{ik}$ are $-\rho_{jk}$ for $q = 2$ and are $-\rho_{jk \mid M}$ for $q > 2$
and $M$ the set of positive integers not greater than $q$ and not equal to $j$ or $k$. Values of $d_{ij}$
that are unusually large in magnitude for a standard normal random variable thus suggest some deviation
from the assumption of multivariate normality of the vector $\mathbf{X}_i$ of section scores. The source of
the deviation is not generally evident, but the unusually large standardized error suggests that
investigation is in order.

In practice, the $\mu_j$ and $\gamma_{jk}$ are unknown, and they must be estimated by use of the sample
statistics $\bar{X}_j$ and $C_{jk}$. In a standard linear regression analysis, for each section $j$, the score $X_{ij}$
of examinee $i$, $1 \le i \le n$, is predicted by the remaining section scores $X_{ik}$, $k \ne j$, by use of the
least-squares prediction

$$\hat{X}_{ij} = a_j + \sum_{k \ne j} b_{jk} X_{ik}.$$

Here $a_j$ and $b_{jk}$ are selected so that the sum of squares $S_j = \sum_{i=1}^{n}(X_{ij} - \hat{X}_{ij})^2$ is minimized. Thus

$$a_j = \bar{X}_j - \sum_{k \ne j} b_{jk}\bar{X}_k$$

and

$$\sum_{m \ne j} C_{km} b_{jm} = C_{jk}$$

for $k \ne j$. In addition,

$$S_j/(n - 1) = C_{jj} - \sum_{k \ne j} b_{jk} C_{jk}.$$

The raw residuals $\hat{e}_{ij} = X_{ij} - \hat{X}_{ij}$ can be used to find unusual differences between observed
section scores $X_{ij}$ and predicted section scores $\hat{X}_{ij}$. As the sample size $n$ becomes large, analysis
can exploit the well-known limit results that $\bar{X}_j$ converges with probability 1 to $\mu_j$ and $C_{jk}$
converges with probability 1 to $\gamma_{jk}$, so that $a_j$ converges with probability 1 to $\alpha_j$ and $b_{jk}$ converges
with probability 1 to $\beta_{jk}$. The mean squared error $s_j^2 = S_j/(n - q)$, whose divisor reflects the $q$
estimated coefficients $a_j$ and $b_{jk}$, $k \ne j$, then converges with probability 1 to $\tau_j^2$, so that, for the
standardized residual $u_{ij} = \hat{e}_{ij}/s_j$, the difference $u_{ij} - d_{ij}$ converges with probability 1 to 0 for
each fixed examinee $i$. It follows that $u_{ij}$ has an approximate standard normal distribution, so that
large values of $|u_{ij}|$ can suggest deviations from multivariate normality.
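The sketch below assumes NumPy and the examinee-by-section layout used above; `section_regression` is a hypothetical name. It solves the normal equations written in terms of $\bar{X}_j$ and $C_{jk}$ and returns three things:

- the residuals;
- $s_j^2$;
- the standardized residuals $u_{ij}$.

The divisor $n - q$ counts the intercept and the $q - 1$ slopes; for large $n$ the choice of divisor hardly matters.

```python
import numpy as np

def section_regression(X, j):
    """Least-squares fit of section j on the other sections via the normal equations in
    terms of section means and covariances. Returns a_j, b_jk, raw residuals, s_j^2, u_ij."""
    X = np.asarray(X, dtype=float)
    n, q = X.shape
    others = [k for k in range(q) if k != j]
    xbar = X.mean(axis=0)
    C = np.cov(X, rowvar=False)
    b = np.linalg.solve(C[np.ix_(others, others)], C[others, j])   # sum_m C_km b_jm = C_jk
    a = xbar[j] - b @ xbar[others]                                 # a_j
    resid = X[:, j] - (a + X[:, others] @ b)                       # X_ij - Xhat_ij
    s2 = resid @ resid / (n - q)         # S_j over n - q: intercept plus q - 1 slopes
    return a, b, resid, s2, resid / np.sqrt(s2)
```

The coefficients agree with an ordinary least-squares fit that includes an intercept column, so any standard regression routine can serve as a cross-check.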

With a bit more effort, a variation on the standardized residual $u_{ij}$, the externally studentized
residual $r_{ij}$ (Draper & Smith, 1998, p. 208), can be used, which has the property that $r_{ij}$ has a
Student $t$ distribution on $n - q - 1$ degrees of freedom if the multivariate normality assumption
holds. It remains true that $r_{ij} - d_{ij}$ converges to 0 with probability 1 as the sample size $n$ becomes
large; however, the exact distributional result may be helpful in samples of modest size, and
calculation of $r_{ij}$ is quite straightforward with standard software.

The definition of the externally studentized residuals involves estimation of the error in
prediction of $X_{ij}$ that results from use of regression coefficients computed by use of data from
all observed examinees except for examinee $i$. Let $\delta_{fi}$ be 1 for $f = i$ and 0 otherwise. Consider
minimization of the sum of squares

$$S_{j(i)} = \sum_{f=1}^{n}(X_{fj} - \hat{X}_{fj(i)})^2,$$

where

$$\hat{X}_{fj(i)} = a_{j(i)} + \sum_{k \ne j} b_{jk(i)} X_{fk} + v_{ij}\delta_{fi}.$$

Because $\delta_{fi}$ is only nonzero for $f = i$, the minimum requires $X_{ij} = \hat{X}_{ij(i)}$, which is achieved for
the deleted residual

$$v_{ij} = X_{ij} - a_{j(i)} - \sum_{k \ne j} b_{jk(i)} X_{ik};$$

it follows that $S_{j(i)}$ is the residual sum of squares from a regression of the section score $X_{fj}$ on the
section scores $X_{fk}$, $k \ne j$, for examinees $f \ne i$. In addition, $v_{ij}$ is the error in prediction of $X_{ij}$ by
$X_{ik}$, $k \ne j$, that results from use of the regression based on all examinees $f \ne i$. If $h_{ijk}$, $1 \le k \le q$,
$k \ne j$, satisfies

$$\sum_{m \ne j} C_{km} h_{ijm} = X_{ik} - \bar{X}_k$$

for $k \ne j$ and

$$g_{ij} = 1 - n^{-1} - (n - 1)^{-1}\sum_{k \ne j} h_{ijk}(X_{ik} - \bar{X}_k) > 0,$$

then $v_{ij} = \hat{e}_{ij}/g_{ij}$ and

$$S_{j(i)} = S_j - v_{ij}\hat{e}_{ij} = S_j - \hat{e}_{ij}^2/g_{ij}$$

(Draper & Smith, 1998, p. 207). Here $1 - g_{ij}$ is the leverage of examinee $i$ in the regression for
section $j$. Under the multivariate normality assumption, $g_{ij} > 0$ with probability 1; conditional on
the scores $X_{fk}$, $k \ne j$, $v_{ij}$ has mean 0 and variance $\tau_j^2/g_{ij}$; the estimate $s_{j(i)}^2 = S_{j(i)}/(n - q - 1)$ of $\tau_j^2$ is
independent of $v_{ij}$; and the externally studentized residual $r_{ij} = v_{ij}/[s_{j(i)}^2/g_{ij}]^{1/2}$ has a Student $t$
distribution with $n - q - 1$ degrees of freedom. In typical applications, $n$ is so much larger than $q$
that the $t$ distribution is very close to a standard normal distribution.
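The deletion formulas mean that the regression does not have to be refitted $n$ times. Below is a minimal NumPy sketch; it assumes the leverage form of $g_{ij}$ given above, and `externally_studentized` is a hypothetical helper, not a library function.

```python
import numpy as np

def externally_studentized(X, j):
    """Externally studentized residuals r_ij, i = 1..n, for section j regressed on the
    other q - 1 sections, using S_j(i) = S_j - e_ij^2 / g_ij instead of n refits."""
    X = np.asarray(X, dtype=float)
    n, q = X.shape
    others = [k for k in range(q) if k != j]
    xbar = X.mean(axis=0)
    C = np.cov(X, rowvar=False)
    C_oo = C[np.ix_(others, others)]
    b = np.linalg.solve(C_oo, C[others, j])             # slopes b_jk
    D = X[:, others] - xbar[others]                     # X_ik - Xbar_k
    e = (X[:, j] - xbar[j]) - D @ b                     # raw residuals
    S = e @ e                                           # S_j
    H = np.linalg.solve(C_oo, D.T).T                    # row i holds h_ijk, k != j
    g = 1.0 - 1.0 / n - (H * D).sum(axis=1) / (n - 1)   # g_ij = 1 - leverage
    v = e / g                                           # deleted residuals v_ij
    s2_del = (S - e**2 / g) / (n - q - 1)               # s_j(i)^2
    return v / np.sqrt(s2_del / g)
```

Before relying on such a routine operationally, compare it against an explicit leave-one-out refit on simulated data.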

In quality assurance, externally studentized residuals provide a guide to examinees who merit 
investigation and a guide to assessment results that warrant investigation. For example, at an 
individual level, consider an inspection scheme in which an examinee’s responses are examined for 
possible processing errors if $|r_{ij}| > 4$. The examination might involve a hand examination of the
original answer sheet, an image of the answer sheet, or a full list of responses stored in a database. 
The examiner would seek to detect possible scanning errors, accidental omission of all or part of 
a section, or errors in gridding. Especially in cases in which the number of examinees inspected 
is close to the number expected under multivariate normality, it is quite likely that most or all 
inspections will not indicate anything noteworthy. Some examinees will necessarily score much 
higher or lower on a section than suggested by performance on other sections. Not much can really 
be done about an examinee who omits an entire section by accident, although a notification that 
this situation was observed might be more useful to the examinee than a diagnostic score report. 

Under the multivariate normal model, for a particular examinee $i$ and section $j$, $|r_{ij}| > 4$ with
probability $P(|T_{n-q-1}| > 4)$, where $T_{n-q-1}$ has a $t$ distribution on $n - q - 1$ degrees of freedom. As
$n - q$ approaches $\infty$, $P(|T_{n-q-1}| > 4)$ decreases to the limit $P(|Z| > 4)$, where $Z$ is a standard
normal random variable. If $\Phi$ is the standard normal distribution function,

$$P(|Z| > 4) = 2[1 - \Phi(4)] = 0.0000633.$$

The approach to the limit is not especially rapid. For instance, for $n - q - 1 = 1{,}000$, the
probability is 0.0000680. For $q = 4$ and $n - q - 1 = 1{,}000$, the Bonferroni inequality (Feller, 1968,
p. 110) bounds the probability that a given examinee is inspected by $0.0000680q = 0.000272$. Thus
inspection may be relatively infrequent in favorable cases.
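The tail probabilities quoted above can be reproduced without special libraries. This sketch uses only Python's `math` module. The normal tail comes from the complementary error function, and the $t$ tail from Simpson's rule applied to the $t$ density. The integration width and step count are assumptions that suit large degrees of freedom; the function names are hypothetical.

```python
import math

def normal_two_sided_tail(c):
    """P(|Z| > c) = 2[1 - Phi(c)] for a standard normal Z."""
    return math.erfc(c / math.sqrt(2.0))

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    logc = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(logc - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_sided_tail(c, df, width=40.0, steps=20000):
    """P(|T_df| > c) by composite Simpson's rule on [c, c + width] (steps must be even).

    Mass beyond c + width is ignored: negligible for large df, not for very small df.
    """
    h = width / steps
    total = t_density(c, df) + t_density(c + width, df)
    for s in range(1, steps):
        total += (4 if s % 2 else 2) * t_density(c + s * h, df)
    return 2.0 * total * h / 3.0

p_z = normal_two_sided_tail(4.0)     # limiting probability as n - q grows
p_t = t_two_sided_tail(4.0, 1000)    # 1,000 degrees of freedom
bound = 4 * p_t                      # Bonferroni bound for q = 4 sections
```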

A summary of residual results for the complete cohort of examinees may also be of interest.
For example, one might examine the number $F$ of examinees $i$, $1 \le i \le n$, for whom $|r_{ij}| > 4$
for some $j$ from 1 to $q$. The fraction $p = F/n$ of examinees with some externally studentized
residual of magnitude greater than 4 might be studied. If the administration size $n$ is large relative to
$q$, then $p$ may be regarded as an estimate of the probability $\pi$ that $m_i = \max_{1 \le j \le q}|d_{ij}| > 4$. In
reality, exact multivariate normality is not expected, so that $\pi$ is likely to differ from the value
based on a multivariate normality assumption, but examination of $p$ may provide a reasonable
method to screen the examination data for unusual behavior associated with groups of examinees.
For example, a problem with keys, scanners, or form codes is likely to affect a large number of
examinees. Thus a large $p$ can suggest a more thorough study of examination results.
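Given a matrix of externally studentized residuals, the count $F$, the fraction $p$, and the list of examinees to inspect follow in a few lines. The sketch assumes NumPy; `outlier_summary` is a hypothetical helper, and the threshold of 4 is the value used in the text.

```python
import numpy as np

def outlier_summary(R, threshold=4.0):
    """For an (n, q) array of residuals r_ij, return F, p = F / n, and the indices of
    examinees with |r_ij| > threshold for some section j (the inspection list)."""
    R = np.asarray(R, dtype=float)
    flagged = np.any(np.abs(R) > threshold, axis=1)   # max_j |r_ij| > threshold
    F = int(flagged.sum())
    return F, F / R.shape[0], np.flatnonzero(flagged)
```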

1.2 Control Limits 

Examination of fluctuations of $p$ from administration to administration may be employed to
indicate that further investigation of the results of a particular test administration is warranted.
In principle, such examination is a standard problem in statistical process control (Burr, 1979;
Montgomery, 2004); however, the problem in practice is often complicated by the limited number
of administrations for which data are available and by the fact that unusually large or small
residuals normally appear with low probability. Three basic options can be considered. The
options depend on the assumptions made concerning what is regarded as the normal situation. To
discuss the available options, consider a sequence of administrations $k \ge 1$. For administration $k$,
let $\pi_k$ be the probability that, for an examinee $i$ from the population from which the administration
is drawn, some standardized section error exceeds 4 in magnitude, let $n_k$ be the number of
examinees observed in administration $k$, and let $p_k$ be the fraction of examinees in administration
$k$ with some externally studentized residual that exceeds 4 in magnitude. Here the externally studentized
residuals in the definition of $p_k$ are computed based only on data from administration $k$. If
the sample size is sufficiently large, then the distribution of $n_k p_k$ is very close to a binomial
distribution with sample size $n_k$ and probability $\pi_k$. Under the multivariate normality assumption,
the Bonferroni inequality implies that $\pi_k$ cannot exceed $\pi^* = 2q[1 - \Phi(4)] = 0.0000633q$. One may
then obtain the simple $p$-chart upper control limit on $p_k$ that

$$p_k \le \min\{1, \pi^* + 3[\pi^*(1 - \pi^*)/n_k]^{1/2}\}.$$

Provided that $n_k\pi^*$ is large enough for a normal approximation to be reasonable, the probability
that $p_k$ is not within its control limit is at most about $1 - \Phi(3) = 0.00135$, the probability that a
standard normal random variable exceeds 3. If one is concerned about the normal approximation,
then the binomial approximation may be used. The upper control limit is then $L_k/n_k$, where $L_k$
is the smallest integer such that either $L_k = n_k$ or the probability that a binomial random variable
with sample size $n_k$ and probability $\pi^*$ exceeds $L_k$ is no greater than $1 - \Phi(3)$. The
probability is then no greater than $1 - \Phi(3)$ that $p_k$ is not within its control limit. Whether or
not a normal or binomial approximation is used, a lower control limit is not likely to be of much
interest given that $\pi^*$ is only an upper bound. In addition, attempts to find a lower control limit
based on $\pi^*$ are likely to lead to a trivial limit of 0 in many cases.
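Here is a sketch of both upper control limits in plain Python; the names are hypothetical. The binomial limit accumulates the probability function recursively, $P(B = L + 1) = P(B = L)\,(n_k - L)\pi^*/[(L + 1)(1 - \pi^*)]$. This avoids large binomial coefficients when $n_k$ is in the thousands.

```python
import math

def pi_star(q, c=4.0):
    """Bonferroni bound pi* = 2q[1 - Phi(c)] on the per-examinee flag probability."""
    return q * math.erfc(c / math.sqrt(2.0))

def normal_upper_limit(pi, n, z=3.0):
    """Simple p-chart upper control limit min{1, pi + z [pi (1 - pi) / n]^{1/2}}."""
    return min(1.0, pi + z * math.sqrt(pi * (1.0 - pi) / n))

def binomial_upper_limit(pi, n, z=3.0):
    """L / n, where L is the smallest integer with L = n or P(B > L) <= 1 - Phi(z),
    for B binomial with sample size n and probability pi."""
    alpha = 0.5 * math.erfc(z / math.sqrt(2.0))     # 1 - Phi(3), about 0.00135
    pmf = math.exp(n * math.log1p(-pi))             # P(B = 0)
    cdf = pmf
    L = 0
    while L < n and 1.0 - cdf > alpha:
        pmf *= (n - L) / (L + 1) * pi / (1.0 - pi)  # P(B = L + 1) from P(B = L)
        L += 1
        cdf += pmf
    return L / n
```

For instance, `normal_upper_limit(pi_star(4), 6409)` evaluates the normal-approximation limit for $q = 4$ and an administration of 6,409 examinees.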

If one relaxes the multivariate normality assumption but assumes that the $\pi_k$ are constant for
different administrations $k$, then an alternative approach may be used based on estimation of a
common $\pi_k$. Let data from administrations 1 to $K \ge 1$ be used for examination of administration
$K + 2$. Suppose that both unusual individual values of $p_k$ and large changes in $p_k - p_{k-1}$ are to be
explored. Let

$$N_K = \sum_{k=1}^{K} n_k,$$

and let

$$\bar{p}_K = N_K^{-1}\sum_{k=1}^{K} n_k p_k$$

be the estimate of the common $\pi_k$ based on the first $K$ administrations. Then the estimated
standard deviation of $p_{K+2} - \bar{p}_K$ based on the first $K$ administrations and the sample size $n_{K+2}$ is

$$s_{pK} = [\bar{p}_K(1 - \bar{p}_K)(N_K^{-1} + n_{K+2}^{-1})]^{1/2},$$

so that $p_{K+2}$ has control limits $\bar{p}_K - 3s_{pK}$ and $\bar{p}_K + 3s_{pK}$. Similarly, the estimated standard
deviation of $p_{K+2} - p_{K+1}$ based on the first $K$ administrations and the sample sizes $n_{K+1}$ and
$n_{K+2}$ is

$$s_{pKd} = [\bar{p}_K(1 - \bar{p}_K)(n_{K+1}^{-1} + n_{K+2}^{-1})]^{1/2},$$

so that $p_{K+2} - p_{K+1}$ has control limits $-3s_{pKd}$ and $3s_{pKd}$. If $n_{K+1}$, $n_{K+2}$, and $N_K$ are all
sufficiently large for normal approximations to apply, then $p_{K+2}$ falls outside its control limits with
approximate probability $2[1 - \Phi(3)] = 0.00270$, and $p_{K+2} - p_{K+1}$ falls outside its control limits with
the same approximate probability.
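The pooled-rate limits are simple arithmetic. This plain-Python sketch (the function name is hypothetical) takes the sample sizes and flag counts of administrations $1, \ldots, K + 2$ in order. Applied to the Table 1 counts with $K = 2$, it gives a lower limit of about 0.00154 for $p_4$ and limits of about $\pm 0.00311$ for $p_4 - p_3$.

```python
import math

def constant_rate_limits(sizes, counts, z=3.0):
    """Limits for p_{K+2} and for p_{K+2} - p_{K+1} under a common rate pi_k.

    sizes[k] and counts[k] hold n_k and n_k p_k for administrations 1, ..., K + 2
    (0-based positions 0, ..., K + 1); the first K are pooled.
    """
    K = len(sizes) - 2
    N_K = sum(sizes[:K])
    pbar = sum(counts[:K]) / N_K                     # N_K^{-1} sum n_k p_k
    var = pbar * (1.0 - pbar)
    s_level = math.sqrt(var * (1.0 / N_K + 1.0 / sizes[K + 1]))
    s_diff = math.sqrt(var * (1.0 / sizes[K] + 1.0 / sizes[K + 1]))
    return (pbar - z * s_level, pbar + z * s_level), (-z * s_diff, z * s_diff)

# Table 1 with K = 2: administrations 1 and 2 pooled, administration 4 examined.
level, diff = constant_rate_limits([6432, 9087, 6409, 9073], [30, 33, 25, 31])
```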

Because typical variations in the distribution of $\mathbf{X}_i$ for different administrations may result in
appreciable variation in $\pi_k$ for different administrations, consider the following variation on the
traditional XmR chart that uses plots over time of both individual measurements and changes
between successive individual measurements. Assume data have been gathered to date from
administrations 1 to $K \ge 2$ and control limits are to be placed on administration $K + 2$. The $n_k$
are regarded as random variables, and the $p_k$ for $k \ge 1$ are assumed to behave as a white-noise
time series, so that the $p_k$ are independent normal random variables with positive common mean
$\theta$ and positive common variance $v$. Let

$$\bar{p}_K = K^{-1}\sum_{k=1}^{K} p_k$$

be the sample mean of the proportions $p_k$ for $k$ from 1 to $K$, let

$$s_K^2 = (K - 1)^{-1}\sum_{k=1}^{K}(p_k - \bar{p}_K)^2$$

be the sample variance of the $p_k$, $1 \le k \le K$, and let $s_K$, the square root of $s_K^2$, be the sample
standard deviation of the $p_k$, $1 \le k \le K$. Consider construction of control limits for $p_{K+2}$. Let
$t_{K-1}$ be selected so that the probability that a random variable with a $t$ distribution on $K - 1$
degrees of freedom is less than $t_{K-1}$ is 0.99865, the probability that a standard normal random variable
is less than 3. Because $p_{K+2} - \bar{p}_K$ has variance $(K + 1)v/K$ and $p_{K+2} - p_{K+1}$ has variance $2v$, and
both are independent of $s_K$, the control limits for $p_{K+2}$ are $\bar{p}_K - [(K + 1)/K]^{1/2}t_{K-1}s_K$ and
$\bar{p}_K + [(K + 1)/K]^{1/2}t_{K-1}s_K$. In like fashion, control limits for $p_{K+2} - p_{K+1}$ are $-2^{1/2}t_{K-1}s_K$ and
$2^{1/2}t_{K-1}s_K$. Under the white-noise model, the probability is $2[1 - \Phi(3)] = 0.00270$ that $p_{K+2}$ is
outside its control limits, and the probability is also 0.00270 that $p_{K+2} - p_{K+1}$ is outside its control
limits. For $K$ large, the outlined procedure approaches the customary results from an XmR chart.
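Computing $t_{K-1}$ requires a $t$ quantile. SciPy's `scipy.stats.t.ppf` would serve, but this sketch does not assume SciPy is available. Instead it evaluates $P(|T| \le t)$ with the classical closed-form series for integer degrees of freedom and inverts it by bisection. The function names are hypothetical.

```python
import math
import statistics

def t_abs_cdf(t, df):
    """P(|T| <= t) for a t variable with integer df >= 1 (closed-form series)."""
    theta = math.atan(t / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df == 1:
        return 2.0 * theta / math.pi
    if df % 2 == 0:   # even df: sin(theta)[1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ...]
        term = total = 1.0
        for k in range(2, df, 2):
            term *= c * c * (k - 1) / k
            total += term
        return s * total
    term = total = c  # odd df: (2/pi)[theta + sin(theta)(cos + (2/3)cos^3 + ...)]
    for k in range(3, df - 1, 2):
        term *= c * c * (k - 1) / k
        total += term
    return 2.0 / math.pi * (theta + s * total)

def t_upper_quantile(prob, df):
    """t with P(T < t) = prob for prob > 0.5, by bisection on t_abs_cdf."""
    target = 2.0 * prob - 1.0
    lo, hi = 0.0, 1.0
    while t_abs_cdf(hi, df) < target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if t_abs_cdf(mid, df) < target else (lo, mid)
    return 0.5 * (lo + hi)

def xmr_limits(p_history, z=3.0):
    """Limits for p_{K+2} and p_{K+2} - p_{K+1} from p_1, ..., p_K (white-noise model)."""
    K = len(p_history)
    prob = 1.0 - 0.5 * math.erfc(z / math.sqrt(2.0))     # Phi(3), about 0.99865
    t = t_upper_quantile(prob, K - 1)
    center, s = statistics.mean(p_history), statistics.stdev(p_history)
    half = math.sqrt((K + 1) / K) * t * s
    return (center - half, center + half), (-math.sqrt(2.0) * t * s, math.sqrt(2.0) * t * s)
```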

1.3 Residuals for the Total Score 

The approach using externally studentized residuals can also be applied to the total score
$Y_i$ by itself; however, the analysis is much simpler. One considers prediction of $Y_i$ by a constant
$\mu$. The optimal choice of $\mu$ for best linear prediction has $\mu$ equal to the expectation of $Y_i$. The
prediction error is $e_i = Y_i - \mu$, and the standardized error is $d_i = e_i/\sigma$, where $\sigma$ is the standard
deviation of $Y_i$. The mean-squared error of prediction of $Y_i$ is then the variance $\sigma^2$ of $Y_i$. Let
$\bar{Y} = n^{-1}\sum_{i=1}^{n} Y_i$ be the sample mean of the $Y_i$. The estimated prediction $\hat{Y}_i$ of $Y_i$ is $\bar{Y}$ for each
examinee $i$. The residual mean-squared error is the sample variance

$$s^2 = (n - 1)^{-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2.$$

For examinee $i$, the residual is $\hat{e}_i = Y_i - \bar{Y}$, the standardized residual is $u_i = \hat{e}_i/s$, the deleted
residual is $v_i = n\hat{e}_i/(n - 1)$, and the externally studentized residual is then

$$r_i = \frac{[n/(n - 1)]^{1/2}\,\hat{e}_i}{\{(n - 2)^{-1}[(n - 1)s^2 - \hat{e}_i^2\,n/(n - 1)]\}^{1/2}}.$$

The multivariate normality assumption for the section scores implies that $r_i$ has a $t$ distribution
with $n - 2$ degrees of freedom. For individuals, unusually large or small values of $r_i$ may suggest
problems with form codes, gridding, or scanning, so that detailed examination of the examinee
record can be warranted. An unusual fraction $p'$ of examinees with $|r_i| > 4$ may suggest more
general problems with the administration. The procedures for control limits for $p$ are readily
changed for control limits for $p'$. The main change is that the multivariate normality case implies
that $p'$ has expectation approximately equal to $2[1 - \Phi(4)]$.
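For the total score the deletion formulas collapse to the closed form above. Here is a short NumPy sketch; the helper name is hypothetical.

```python
import numpy as np

def total_score_studentized(Y):
    """Externally studentized residuals r_i for prediction of Y_i by a constant."""
    Y = np.asarray(Y, dtype=float)
    n = Y.size
    e = Y - Y.mean()                                   # residuals Y_i - Ybar
    s2 = e @ e / (n - 1)                               # sample variance s^2
    s2_del = ((n - 1) * s2 - e**2 * n / (n - 1)) / (n - 2)
    return np.sqrt(n / (n - 1)) * e / np.sqrt(s2_del)
```

Each value equals $(Y_i - \bar{Y}_{(i)})/\{s_{(i)}[n/(n - 1)]^{1/2}\}$, where $\bar{Y}_{(i)}$ and $s_{(i)}$ are the mean and standard deviation of the other $n - 1$ totals. This gives an easy leave-one-out check.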

1.4 Low Scores 

A second screening method may also be helpful, especially in the case of errors in recording
form codes or using answer keys. One can determine the expected total score $\mu_0$ obtained by
an examinee who answers all items randomly. For example, in an examination with 50 items
that is right-scored and has five alternatives for each item, the expected total score for a random
responder is 10. One might consider examination of all answer sheets for examinees with an
observed score no greater than $\mu_0$. In cases in which this number is large, a random sample of such
answer sheets might be considered. In addition to verification of form codes, one might again check by
hand for scanning errors. A simple screen is to record the number $G$ of examinees $i$, $1 \le i \le n$,
with $Y_i$ no greater than $\mu_0$. Again, examination of fluctuations of the fraction $g = G/n$ from
administration to administration can be used to indicate that further investigation is needed. The
control limits previously described can be applied to this case as well, although the multivariate
normal case would not normally be considered.
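The low-score screen needs only the number of items and the number of choices per item. Here is a plain-Python sketch with hypothetical names; it assumes right scoring with every item answered at random.

```python
def random_response_mean(n_items, n_choices):
    """Expected right score mu_0 when every item is answered at random."""
    return n_items / n_choices

def low_score_screen(totals, mu0):
    """Number G of total scores no greater than mu0, and the fraction g = G / n."""
    G = sum(1 for y in totals if y <= mu0)
    return G, G / len(totals)

mu0 = random_response_mean(120, 4)   # 120 items with four choices each
```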

2 An Application 

A multiple-choice right-scored examination produced by ETS was examined. In this
assessment, 120 items are divided into $q = 4$ sections, sections 1 to 4. For examinee $i$, $X_{ij}$ is the
number of correct responses in section $j$, and $Y_i = X_{i1} + X_{i2} + X_{i3} + X_{i4}$ is the total raw score.
Each item is multiple-choice, and four choices are used in each case. The test is right-scored,
so that random generation of responses leads to an expected total score of 30. To obtain some
insight into customary variability, four administrations of the same form were analyzed separately.
No access to the original answer sheets was available, so that some limitations on explanation of
results necessarily exist. Note that the order of the examinations in the example is not necessarily
the actual temporal order, so that the control limits here sometimes may differ from those that
would be constructed in practice.

Table 1

Relative Frequency of Examinees With Large Externally Studentized Subscore
Residuals in Administrations of Examination

Administration   No. examinees   Frequency   Relative frequency
1                6,432           30          0.0047
2                9,087           33          0.0036
3                6,409           25          0.0039
4                9,073           31          0.0034

Table 2

Relative Frequency of Examinees With Very Low Total Scores in
Administrations of Examination

Administration   No. examinees   Frequency   Relative frequency
1                6,432           2           0.0003
2                9,087           7           0.0008
3                6,409           3           0.0005
4                9,073           2           0.0002

Table 1 summarizes results for the regression analyses, while Table 2 summarizes results for 
low total scores. Administrations are numbered rather than listed by date to prevent disclosure of 
dates on which the same form was used. For these administrations, for the observed sample means 
and sample variances of the total scores, no externally studentized residual for a total score can
exceed 4; however, it is possible for such a residual to be less than $-4$. Nonetheless, no instance of
an externally studentized residual less than $-4$ was observed.

The administrations do not differ markedly in terms of examinees with externally studentized 
residuals for section scores that are exceptionally small or large or in terms of examinees with 
very low scores. In the case of section scores, a Pearson chi-square test that the $\pi_k$ are constant
for all administrations yields a chi-square of 1.68 on three degrees of freedom. Given the large
sample sizes, the problem of dependence of residuals within an administration can be assumed to
be negligible, so that this Pearson chi-square indicates no obvious variability. The case of low
scores involves rather low expected values under the null hypothesis of a constant rate, but the
chi-square value of 3.40 on three degrees of freedom does not indicate any obvious variability by
administration.
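The two chi-square values can be recomputed from the counts in Tables 1 and 2. The test is a standard Pearson test on the 2 by 4 table of flagged and unflagged examinees. Here is a plain-Python sketch; the function name is hypothetical.

```python
def pearson_homogeneity(sizes, flagged):
    """Pearson chi-square statistic and degrees of freedom for the hypothesis that the
    flag probability is the same in every administration (2 by K table)."""
    N, T = sum(sizes), sum(flagged)
    chi2 = 0.0
    for n_k, f_k in zip(sizes, flagged):
        exp_flag = n_k * T / N                 # expected flagged count
        exp_rest = n_k - exp_flag              # expected unflagged count
        chi2 += (f_k - exp_flag) ** 2 / exp_flag
        chi2 += ((n_k - f_k) - exp_rest) ** 2 / exp_rest
    return chi2, len(sizes) - 1

sizes = [6432, 9087, 6409, 9073]
chi_sections, df = pearson_homogeneity(sizes, [30, 33, 25, 31])   # Table 1
chi_low, _ = pearson_homogeneity(sizes, [2, 7, 3, 2])             # Table 2
```

With expected flagged counts near 3 in Table 2, the chi-square approximation is rough, as the text notes.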

For each administration, the rate of externally studentized subscore residuals with magnitude
greater than 4 is much higher than expected under normality, for $\pi^*$ is $0.000253 = 4(0.0000633)$,
so that the largest control limit under the multivariate normality assumption is

$$0.000253 + 3[0.000253(1 - 0.000253)/6409]^{1/2} = 0.000850.$$

Under the binomial approximation, the control limits range from 0.000440 to 0.000468, so the
choice of approximations does affect the control limits. Nonetheless, the persistent failure of the
multivariate normal model remains evident.

The control limits for constant $\pi_k$ are much less readily violated by the externally studentized
subscore residuals. For a simple example, consider the case of $K = 2$. With this case, the bounds
for $p_4$ are 0.00154 and 0.00658. The bounds for $p_4 - p_3$ are $-0.00311$ and 0.00311. These bounds
are readily satisfied by the observed data, so that no suggestion of a fundamental change exists.
For this example, use of the bounds that do not assume constant $\pi_k$ is impractical due to the very
large value of $t_{K-1}$. For $K = 2$, $t_{K-1} = 235.801$ and $[(K + 1)/K]^{1/2}t_{K-1} = 288.797$. The approach
is much more reasonable for somewhat larger $K$. For example, for $K = 11$, $t_{K-1} = 3.957$ and
$[(K + 1)/K]^{1/2}t_{K-1} = 4.133$. For $K = 21$, $t_{K-1} = 3.422$ and $[(K + 1)/K]^{1/2}t_{K-1} = 3.503$. Note
that for quite large $K$, $[(K + 1)/K]^{1/2}t_{K-1}$ is close to 3.

In the case of externally studentized residuals based on total score, control limits based
on the normal-theory rate $2[1 - \Phi(4)]$ are obviously not violated, for no externally studentized
residual is unusually large in magnitude. In the case of the fraction of total scores that are not greater
than the expected total score with random response, consideration of a constant probability of a very
low score for each administration is complicated by the very small but positive frequency counts
observed, so that more data would be desirable before a normal approximation is used. Again, the
data do not suffice for the case of variable probabilities for different administrations.

Both very low scores and very large externally studentized section residuals are very strongly 
associated with omitted responses. Generally, examinees on this right-scored test answer all items. 
Table 3

Relative Frequency of Examinees With Large Externally Studentized Column
Residuals in Administrations of Examination

Administration   No. examinees   Frequency   Relative frequency
1                6,432           25          0.0039
2                9,087           29          0.0032
3                6,409           25          0.0039
4                9,073           25          0.0028


Among all 31,001 examinees, 27,876 examinees, 89.9% of the total, answered all items. Among
these examinees, 31, or 0.111%, had patterns of responses that resulted in unusually large
externally studentized residuals for section scores, and 4, or 0.014%, had no more correct answers
than would be expected with random response. Of the 3,125 examinees with some missing
response, 88, or 2.816%, had unusual section residuals, and 10, or 0.320%, had very low total
scores. Thus 73.9% of all examinees with large section residuals and 71.4% of all examinees with
very low scores had missing responses. Results are even more striking when 30 or more responses
were omitted. For the four administrations, 44 examinees omitted 30 or more responses. Among
this group, 25 examinees, or 56.8%, had unusually large externally studentized residuals and 7,
or 15.9%, had very low total scores. Thus half of examinees with very low scores and 21.0%
of examinees with unusually large section residuals had at least 30 omissions. Among the 44
examinees with 30 or more omitted responses were 27 who omitted the entire final section.

A parallel analysis divided the 120 responses into three parts of 40 items each to reflect the use
of an answer sheet in which each column contained 40 items. Results are shown in Table 3. This
analysis does not appear to differ much from the analysis in Table 1. The reader should remember
that some reduction in examinees with unusually large residuals is associated with a reduction of
$q$ from 4 to 3. For example, $\pi^*$ is now 0.000190 rather than 0.000253.

3 Conclusions 

Because retrieval of the original answer sheets was not feasible, it was not possible to determine whether
unusual scores represented any problem in data processing. As described in section 1, effective
use of outlier checks will require data collection for a significant number of administrations
of a test title. Given these data, a quality monitoring procedure can be established to check
new administrations to ascertain whether the rate of major outliers or very low scores is unusually
high relative to historical records. If a rate is sufficiently notable, then an examination of the
administration is needed to determine whether a significant problem related to data collection, data
processing, or test administration has occurred.